{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![LOGO](../../../img/MODIN_ver2_hrz.png)\n",
    "\n",
    "<center><h2>Scale your pandas workflows by changing one line of code</h2>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 3: Not Implemented\n",
    "\n",
    "**GOAL**: Learn what happens when a function is not yet supported in Modin."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When functionality has not yet been implemented for `HdkOnNative` execution, we default to pandas as follows\n",
    "\n",
    "![](../../../img/hdk_convert_to_pandas.png)\n",
    "\n",
    "We convert the Modin dataframe to a pyarrow.Table, perform a lazy tree execution in HDK, render it as a pyarrow.Table, convert it to pandas to perform the operation, and then convert it back to Modin when complete. These operations will have a large overhead due to the communication involved and will take longer than pandas.\n",
    "\n",
    "When this is happening, a warning will be given to the user to inform them that this operation will take longer than usual. For example, `DataFrame.mask` is not supported. In this case, when a user tries to use it, they will see this warning:\n",
	"\n",
	"```\n",
	"UserWarning: `DataFrame.mask` defaulting to pandas implementation.\n",
	"```\n",
    "\n",
    "#### Relation engine limitations\n",
    "As the `HdkOnNative` execution is backed by relation algebra based DB engine, there is a certain set of limitations on operations that could be used in Modin with such an execution. For example arbitrary functions in `DataFrame.apply` are not supported as the HDK engine can't execute python callables against its tables, this means that `DataFrame.apply(python_callable)` will **always** be defaulting to pandas. \n",
    "\n",
    "For more info about `HdkOnNative` limitations visit the appropriate section on read-the-docs: [relation algebra limitations](https://modin.readthedocs.io/en/stable/flow/modin/experimental/core/execution/native/implementations/hdk_on_native/index.html#relational-engine-limitations).\n",
    "\n",
    "If your flow mainly operates with non-relational algebra operations, you should better choose non-HDK execution (for example, `PandasOnRay`)."
   ]
  },
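  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to check from code whether an operation fell back to pandas is to record the warnings it emits. The sketch below is a minimal illustration that relies only on the warning text shown above and assumes the message is issued through Python's `warnings` module. `defaulted_to_pandas` and `fake_mask` are hypothetical names introduced here, not part of Modin, and `fake_mask` is a stand-in that merely emits the same message so that the snippet runs on its own.\n",
    "\n",
    "```python\n",
    "import warnings\n",
    "\n",
    "\n",
    "def defaulted_to_pandas(operation):\n",
    "    # Run a zero-argument callable and collect every warning message\n",
    "    # that mentions defaulting to pandas.\n",
    "    with warnings.catch_warnings(record=True) as caught:\n",
    "        # Record all warnings, even ones Python would show only once.\n",
    "        warnings.simplefilter('always')\n",
    "        result = operation()\n",
    "    messages = [\n",
    "        str(w.message)\n",
    "        for w in caught\n",
    "        if 'defaulting to pandas' in str(w.message)\n",
    "    ]\n",
    "    return result, messages\n",
    "\n",
    "\n",
    "# Hypothetical stand-in: emits the warning text quoted above without Modin.\n",
    "def fake_mask():\n",
    "    warnings.warn('`DataFrame.mask` defaulting to pandas implementation.', UserWarning)\n",
    "    return 'masked'\n",
    "\n",
    "\n",
    "result, messages = defaulted_to_pandas(fake_mask)\n",
    "print(messages)\n",
    "```\n",
    "\n",
    "With a Modin dataframe `df`, the same helper could be called as `defaulted_to_pandas(lambda: df.mask(df < 50))`; a non-empty list of messages would suggest that the call defaulted to pandas."
   ]
  },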
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Concept for exercise: Default to pandas\n",
    "\n",
    "In this section of the exercise we will see first-hand how the runtime is affected by operations that are not implemented."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import modin.pandas as pd\n",
    "import pandas\n",
    "import numpy as np\n",
    "import time\n",
    "import modin.config as cfg\n",
    "cfg.StorageFormat.put(\"hdk\")\n",
    "\n",
    "frame_data = np.random.randint(0, 100, size=(2**18, 2**8))\n",
    "df = pd.DataFrame(frame_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pandas_df = pandas.DataFrame(frame_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "modin_start = time.time()\n",
    "\n",
    "print(df.mask(df < 50))\n",
    "\n",
    "modin_end = time.time()\n",
    "print(\"Modin mask took {} seconds.\".format(round(modin_end - modin_start, 4)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pandas_start = time.time()\n",
    "\n",
    "print(pandas_df.mask(pandas_df < 50))\n",
    "\n",
    "pandas_end = time.time()\n",
    "print(\"pandas mask took {} seconds.\".format(round(pandas_end - pandas_start, 4)))"
   ]
  },
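  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The timing pattern used in the two cells above can be wrapped in a small helper so that both libraries are measured the same way. This is a minimal sketch rather than part of the exercise: `timed` is a hypothetical name introduced here, it uses `time.perf_counter` instead of `time.time` because the former is intended for measuring short durations, and the `sum(range(1000))` call is only a stand-in so that the snippet runs on its own.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "\n",
    "def timed(label, operation):\n",
    "    # Time a zero-argument callable and report the elapsed wall-clock time.\n",
    "    start = time.perf_counter()\n",
    "    result = operation()\n",
    "    elapsed = time.perf_counter() - start\n",
    "    print('{} took {} seconds.'.format(label, round(elapsed, 4)))\n",
    "    return result, elapsed\n",
    "\n",
    "\n",
    "# Stand-in workload so that the sketch runs without Modin or pandas.\n",
    "result, elapsed = timed('Sum', lambda: sum(range(1000)))\n",
    "```\n",
    "\n",
    "With the dataframes from this exercise, the calls could look like `timed('Modin mask', lambda: print(df.mask(df < 50)))` and `timed('pandas mask', lambda: print(pandas_df.mask(pandas_df < 50)))`; printing inside the lambda mirrors the cells above."
   ]
  },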
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## To request a feature please open an issue: https://github.com/modin-project/modin/issues\n",
    "\n",
    "For a complete list of what is implemented, see the [Supported APIs](https://modin.readthedocs.io/en/latest/supported_apis/index.html) section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
